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I. Introduction 


Scope 

The objective of research was to develop and validate new computational algorithms for 
solving the steady and unsteady Euler and Navier-Stokes equations. The end-products are 
new three-dimensional Euler and Navier-Stokes codes that are faster, more reliable, more 
accurate, and easier to use. 

Approach 

The three-dimensional Euler and full/ thin-layer Reynolds-averaged Navier-Stokes equa- 
tions for compressible/incompressible flows are solved on structured hexahedral grids. The 
Baldwin-Lomax algebraic turbulence model is used for closure. The space discretization is 
based on a cell-centered finite-volume method augmented by a variety of numerical dissi- 
pation models with optional total variation diminishing limiters. The governing equations 
are integrated in time by an implicit method based on lower-upper factorization and sym- 
metric Gauss-Seidel relaxation. The algorithm is vectorized on diagonal planes of sweep 
using two-dimensional indices in three dimensions. Convergence rates and the robustness 
of the codes are enhanced by the use of an implicit full approximation storage multigrid 
method. 

II. Previous Results 

Previous results obtained under this cooperative agreement were reported in MCAT Progress 
Reports submitted in Feb. ’93, Oct. ’92, Apr. ’92, and Oct. ’89. 

III. Current Work 

A new computer program named CENSSD-MG has been developed for compressible flows 
with discontinuities. Details of the code are described in Appendix A and Appendix B. 

IV. Summary and Conclusions 

The improved computational efficiency of the Navier-Stokes solvers by the use of a highly 
vectorizable and unconditionally stable algorithm makes fine grid calculations affordable. 
The resulting code is fast enough to be used as an aerodynamic design tool for advanced 
subsonic and supersonic transport aircrafts. 
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The multigiid method has been applied to an existing three-dimensional compressible Euler solver to 
accelerate the convergence of the implicit symmetric relaxation scheme. This lower-upper symmetric Gauss- 
Seidel implicit scheme is shown to be an effective multigrid driver in three dimensions. A grid refinement study 
is performed including the effects of large cell aspect ratio meshes. Performance figures of the present multigrid 
code on Cray computers including the new C90 are presented. A reduction of three orders of magnitude in the 
residual for a three-dimensional transonic inviscid flow using 920 k grid points is obtained in less than 4 min on 
a Cray C90. 


I. Introduction 

A LTHOUGH unstructured grid methods have been used 
successfully in solving the Euler equations for complex 
geometries, structured grid solvers still remain useful for the 
Navier-Stokes equations because of their natural advantages 
in dealing with the highly clustered meshes in the viscous 
boundary layers. Structured grid methods not only handle 
reasonably complex geometries using multiple blocks but also 
offer a hybrid grid scheme to alleviate difficulties which un- 
structured grid methods have often encountered. Recent de- 
velopments in structured grid solvers have been focused on 
efficiency, as well as accuracy, since most existing three-di- 
mensional Navier-Stokes codes are still not efficient enough to 
be used routinely for aerodynamic design. 

Multigrid methods have been useful for accelerating the 
convergence of iterative schemes. Efficient Euler codes have 
been developed by Jameson 1 using a full approximation stor- 
age method of Brandt 2 in conjunction with the explicit Runge- 
Kutta scheme. The explicit multigrid method has demon- 
strated impressive convergence rates by taking large time steps 
and propagating waves fast on coarse meshes. The explicit 
multigrid method has been extended to the Navier-Stokes 
equations by Martinelli . 3 Several explicit multigrid codes for 
the three-dimensional Navier-Stokes equations have been de- 
veloped successfully by Vatsa and Wedan , 4 Radespiel et al ., 5 
and Turkel et al . 6 

It does not seem to be profitable to consider an unfactored 
implicit scheme for a multigrid driver since the implicit scheme 
can take large time steps which are limited by the physics rather 
than the grid. However, the multigrid method can improve the 
convergence rates of factored implicit schemes in two dimen- 
sions as demonstrated by Yoon 7 and in Refs. 8-10. The im- 
plicit multigrid method has been implemented by Caughey 12 to 
the diagonalized alternating direction scheme , 11 and by Ander- 
son et al . 13 to the three-dimensional alternating direction 
scheme. Yoon 7 has introduced an implicit algorithm based on 
a lower-upper (LU) factorization and symmetric Gauss-Seidel 
(SGS) relaxation. The scheme has been used successfully in 
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computing chemically reacting flows due in part to the al- 
gorithm's property which reduces the size of the left-hand side 
matrix for nonequilibrium flows with finite rate chemis- 
try . 14-16 A recent study 17 shows that the three-dimensional 
extension of the method using a single grid requires less com- 
putational time than most existing codes on a Cray YMP 
computer. One of the objectives of the present work is to 
accelerate the convergence of the lower-upper symmetric 
Gauss-Seidel relaxation scheme in three dimensions by intro- 
ducing a multigrid technique. The performance of the code is 
demonstrated for an inviscid transonic flow past an ONERA 
M 6 wing on highly clustered grids. 

II, Governing Equations 

Let t be time; Q the vector of conserved variables; £, P, and 
(r the convective flux vectors; and P vt P v , and & v the flux 
vectors for the viscous terms. Then the three-dimensional 
Navier-Stokes equations in generalized curvilinear coordinates 
({, 17 , f) can be written as 

d,Q + K) + atf-K) + ^6 - 6.) = o (i) 

where the flux vectors are defined in Ref. 17. The Euler 
equations are obtained by neglecting the viscous terms. 

III. Lower-Upper Symmetric Gauss-Seidel 
Implicit Scheme 

An unfactored implicit scheme can be obtained from a 
nonlinear implicit scheme by linearizing the flux vectors about 
the previous time step and dropping terms of the second and 
higher order. 

[/ + aAt(D ( A + Dji + D£)]6Q = - A tk (2) 

where the residual k is 

k = D { £ + £>,£• + (3) 

and I is the identity matrix. The correction Q" * 1 - Q" is SQ, 
where n denotes the time level. Z>$, D n , and Z) r are difference 
operators that approximate and respectively. A , B y 

and £are the Jacobian matrices of the convective flux vectors. 
For a = Vi, the scheme is second-order accurate in time. For 
other values of a, the time accuracy drops to first order. 

An efficient implicit scheme can be derived by combining 
the advantages of LU factorization and Gauss-Seidel relaxation, 

LD-'m§=-±tk <4) 
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Here, 

L = / + aAt(Df A + + D~b* + Df -A~ 

D = / + aAr(^4 + - >4 “ + + £* - £*“) (5) 

U ® / + c*Ar(D f M ' 4- + /y £- + £-) 

where D ? ~, Af, and D r ~ are backward difference operators, 
while D£ , D ? + , and are forward difference operators. 

In the framework of the LU-SGS algorithm, a variety of 
schemes can be developed by different choices of numerical 
dissipation models and Jacobian matrices of the flux vectors. 
Jacobian matrices leading to diagonal dominance are con- 
structed so that + matrices have non-negative eigenvalues 
whereas - matrices have nonpositive eigenvalues. For example, 

A ± = Ti/ifTf 1 

= ( 6) 

C ± = A * f- 1 

where typically and Tf 1 are similarity transformation ma- 
trices of the eigenvectors of A . Another possibility is to con- 
struct the Jacobian matrices of the flux vectors approximately 
which yield diagonal dominance. 

A ± = Vi[A ± p(A)I) 

b* = Vi[d±p{b)i\ ( 7) 

£* = Vi[ t±p(£)i\ 

where 

p(A) = k max[IX(/4)l] (8) 

Here X(^4) represents eigenvalues of the Jacobian matrix A , 
and k is a constant that is greater than or equal to 1 . A typical 
value of k is 1. However, stability and convergence can be 
controlled by adjusting k as the flowfield develops. 

IV. Vectorization 

The algorithm is completely vectorizable on / +y + k « 
const diagonal planes of sweep. This is achieved by reordering 
the arrays. 

point, /plane) = £{/, y, k) (9) 

where /plane is the serial number of the diagonal plane, and 
/point is the address on that plane. The number of diagonal 
planes is given by 

rt plane = / max + ymax + £max - 5 (10) 

with the maximum vector length of 

n point = (ymax - \)+(kmax - 1) (1 1) 


V. Numerical Dissipation 

The cell-centered finite-volume method is augmented by a 
numerical dissipation model based on blended first- and third- 
order terms. 18 * 21 The finite volume method is based on the 
local flux balance of each mesh cell. For example, 

+ dr^ + dfi = £, + 4 ViJ, k 

+ Kj *Vi.k * PiJ - k + &i, J, k +■ '/: - &i,j, k - Vi ( 12 ) 


Here, the conserved variables are averaged at the ceil faces to 
evaluate the fluxes. The dissipative flux d is added to the 
convective flux in a conservative manner. 

(£< + Vi.j, k ~ V: , j, k + Fi\j + l/ i , k ~ £ i.j - k 

+ k + ! /j ~ k - j) — ( dj + Vl j % k — d, _ i k 

4* dij + i/C i> k ~ d { J _ y ti k 4- dj j t k + [ /i ~ d t j t k - i /j) (13) 

For simplicity, d l+ VltJik is denoted by d i+Vk hereafter. 

dj + Zi = f I i (Oi + \ ~ &i) 

- lAQi + i- 3 + 1+3 ft - Or - i) (14) 

The coefficients of the dissipative terms are the directionally 
scaled spectral radii of the Jacobian matrices. The use of 
directional scaling provides anisotropic dissipation to each 
direction, resulting in improved performance on meshes with 
high aspect ratio cells. Third-order terms formed from fourth 
differences provide the background damping. First-order 
terms are added by second differences near shock waves under 


the control of a 

sensor 


where 

h* v: = max(^ i + v,) 

(15) 


^ = max(vf , vf) 

(16) 

"? = ip, + i 

- 2 Pi + A - 1 1 / ( A . i + 2 a + a_ i) 

(17) 

*! =17-,., 

- 27} + 7",_ ,l/( 7}+ , + 27} + 7}_ ,) 

(18) 


Here p and T are the pressure and the temperature. The 
low-order dissipative coefficient is proportional to the sensor 
v as 


«®w = «‘ 2 H4),- + w (19) 

where r(A) denotes the spectral radius of the Jacobian matrix 
A. The high-order dissipative coefficient is controlled by the 
sensor. 

*1 * = max[0,K ,4 V(/4), . » - e® , 4 ] (20) 

Here k {2) and * (4) are constants which are different from other 
k in Eq. (8). 

Dissipative terms for the coarse grids in the multigrid levels 
are formed from second differences with constant coefficients. 

Since the constant total enthalpy is not preserved in general 
except for the Euler equations, the dissipation for the energy 
equation is based on the total energy rather than the total 
enthalpy. To reduce the amount of dissipation in the direction 
normal to the body surface inside boundary layers, Swanson 
and Turkel 20 provided additional scaling by multiplying a 
spectral radius in the normal direction by a function of the 
local Mach number. Although the Mach number scaling tech- 
nique may improve the accuracy of the Navier-Stokes solu- 
tions, it has not been used for the Euler solutions. It has been 
shown that the convergence rate on high cell aspect ratio 
meshes can be enhanced by multiplying MartinelliV scaling 
factor based on local cell aspect ratio to the dissipative coeffi- 
cients. However, this technique has not been used here be- 
cause the aspect ratio based scaling factor seems to compro- 
mise the accuracy of the solution. 22 

VI. Multigrid Method 

In the present multigrid method, part of the task of tracking 
the evolution of the solution is transferred through a sequence 
of successively coarser meshes. The use of larger control vol- 
umes on the coarser meshes tracks the large-scale evolution. 
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with the consequence that global equilibrium can be more 
rapidly attained. This evolution on the coarse grid is driven by 
the solution of the fine grid equations. The solution vector on 
a coarse grid is initialized as 

Qu = ZShQh/Sv, ( 21 ) 

where the subscripts denote values of the mesh spacing param- 
eter h> S is the cell volume, and the sum is over the eight cells 
of the fine grid which compose each cell of the coarse grid. 
After updating the fine grid solution, the values of the con- 
served variables are transferred to the coarse grid using Eq. 
(21). The pressure is calculated on the coarse grid using the 
transferred variables. Then a forcing function is defined as 

Pi, = £*»«?») - ( 22 ) 

The residual on the coarse grid is given by 

Rlh - Rlh(Qlh) + Plh (23) 

For the next coarser grid, the residual is calculated as 

R?h = R*h(Q*h) + P*h (24) 

where 

Pah = ZRik - *4A«2ff) (25) 



Streamwise Grid Index i 


Fig. 1 Distribution of cell aspect ratios (CAR) for the low CAR grid 
(140 k grid points). 




Fig. 3 Distribution of cell aspect ratios for the high CAR grid (920 k 
grid points). 



Fig. 4 Convergence histories for the Euler and Navier-Stokes equa- 
tions. 


The process is repeated on successively coarser grids. Multiple 
iterations can be done on each coarse grid. Finally, the correc- 
tion calculated on each grid is interpolated back to the next 
finer grid. Let Q 2h be the final value of (? 2 a resulting from both 
the correction calculated on grid 2h and the correction trans- 
ferred from the grid 4 h. Then 

Qx = Qx + /&(&* - Q$!) (26) 

where Q h is the solution on grid h before the transfer to the 
grid 2/i, and / is a trilinear interpolation operator. Since the 
evolution on a coarse grid is driven by residuals collected from 
the next finer grid, the final solution on the fine grid is 
independent of the choice of boundary conditions on the 
coarse grids. 

VII. Results 

To investigate the effectiveness of the full approximation 
storage multigrid method in conjunction with the lower-upper 
symmetric Gauss-Seidel relaxation scheme, transonic flow cal- 
culations have been carried out for an ONERA M6 wing. The 
freestream conditions are at a Mach number of 0.8395, and a 
3.06-deg angle of attack. Since this is an unseparated flow 
case, only the solution of the three-dimensional Euler equa- 
tions is considered. 





All of the calculations are performed without the aid of a 
couple of conventional techniques which have been crucial to 
ensure the robustness of the multigrid method. The aspect 
ratio based scaling factor for the numerical dissipation is not 
employed because of the reason stated in Sec. V. The enthalpy 
damping technique, which is not valid for the Navier-Stokes 
equations, is not used here. 

The treatment of the far-field boundary condition is based 
on the Riemann invariants for a one-dimensional flow normal 
to the boundary. 

To study the effects of mesh cell aspect ratios on the conver- 
gence rates, a low cell aspect ratio 129 x 33 x 33 C-H grid 
(140,481 points) is used first. Figure 1 shows the distribution 
of geometric cell aspect ratios of the first normal mesh cells at 
the body from the leading edge to the downstream boundary. 
Although geometric aspect ratios are less accurate than spec- 
tral radius based aspect ratios, they seem to be useful. The cell 
aspect ratios are of the order of 1 at the wing surface. Figure 
2 shows convergence histories of the root-mean-squared resid- 
uals which correspond to the density corrections. The residu- 
als are normalized by their initial values. The chained line 
indicates the single grid convergence history. The dashed line 
indicates the multigrid convergence history using a four-level 

V cycle with one iteration on each coarse grid. The solid line 
indicates the multigrid convergence history using a four-level 

V cycle with two iterations on each coarse grid. Clearly, the 



Fig. 5 C p at 44°7o span for ONERA M6 wing. 





two iteration strategy converges faster than the single iteration 
cycle. Although not shown here, the three-iteration cycle does 
not improve the performance due to a slight increase in work 
per cycle. Hence, all of the following multigrid calculations 
have been performed using the V cycle with two coarse grid 
iterations. Four orders of residual drop requires 1022 and 156 
iterations for single and multigrid, respectively. The conver- 
gence has been accelerated by a factor of 6.5 by the use of the 
multigrid method on this low ceil aspect ratio grid. 

A high cell aspect ratio 289 x 65 x 49 C-H grid (920,465 
points) gives a severe test to the code. Here 65 grid points are 
used in the normal direction. The distance of the first normal 
grid point from the wing surface is 2.0 x 10~ 6 times the chord 
length. Figure 3 shows that the cell aspect ratios on the wing 
reach as high as 10,000. This grid will be known as the fine 
grid. Two more grids are prepared for a grid refinement study. 
A 145 x 33 x 25 medium grid (119,625 points) is generated by 
eliminating every other grid point from the fine grid. A 
73 x 17 x 13 coarse grid (16,133 points) is generated from the 
medium grid using the same process. Figure 4 shows single 
grid convergence histories of both the Euler and the Navier- 
Stokes equations on the medium grid. For the present attached 
flow case, there is not much difference between the two. This 
suggests that the convergence rate is controlled not by the 
equations but by the grid. 

Figure 5 shows good pressure coefficient agreement between 
experimental data 23 and the multigrid computations on the 
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Tabic 1 Performance on Cray computers (single processor) 


Computer 

Grid 

CPU a 

Mflops 

Note 

YMP 

Single 

7.4 

170 

Euler 

YMP 

Multiple 

11.1 

160 

Euler 

C90 

Single 

3.1 

410 

Euler 

C90 

Multiple 

4.7 

390 

Euler 


‘CPU us per grid point per iteration. 



Fig. 10 Convergence histories on the fine grid (CPU time on a Cray 
YMP). 


fine grid at the 44 semispan station. Single grid convergence 
histories are compared in Fig. 6 and show that the convergence 
rate slows down significantly as the number of grid points 
increases. Figure 7 shows that multigrid convergence histories 
of the three grids are almost identical. Here, the coarse, 
medium, and fine grids use two-, three-, and four-level V 
cycles, respectively. The results demonstrate grid independent 
convergence rates which are achieved by the multigrid method. 

To study the effect of grid stretching on the convergence, 
another fine grid whose first grid distance is 2.0 x 10~ 5 is 
generated. Figure 8 shows that the present numerical method 
is insensitive to grid clustering despite the fact that cell aspect 
ratios differ by an order of magnitude. The convergence rate 
on the highly stretched grid appears to be slightly better than 
the less stretched one for this case. Although not shown here, 
the rate of convergence slows down significantly after the 



residual drops about five orders of magnitude on the highly 
stretched grid. 

Figure 9 shows that the convergence on the multigrid is 
about 6.5 times faster than on the single grid in terms of 
iterations. However, a multigrid cycle requires more time per 
iteration than a single grid cycle not only because of additional 
operations for transfer and interpolation but because of the 
short vector lengths on the coarse grids. As presented in Table 
1, the overhead for the multigrid cycle is approximately 50 °/o. 
Nevertheless, the present CENS3D code requires less than 5 ps 
per grid point per iteration on a Cray C90 computer. With the 
single grid mode, the code needs only 3 /as at the sustained rate 
of 410 Mflops. The CPU times on a Cray YMP and a Cray 
C90 are compared in Figs. 10 and 11, respectively. The CPU 
time is reduced by a factor of four by the use of the multigrid 
method. Figure 11 shows that the residuals for the multigrid 
drop three orders of magnitude in 50 iterations or 4 CPU min 
and four orders in 160 iterations or 13 CPU min on a Cray 
C90. 

Conclusions 

An efficient three-dimensional multigrid code has been de- 
veloped for inviscid compressible flows. The lower-upper sym- 
metric Gauss-Seidel implicit scheme is shown to be an effective 
multigrid driver in three dimensions. The present numerical 
method appears to be insensitive to grid refinement or to 
highly clustered grids with high mesh cell aspect ratios. Grid 
independent convergence rates are achieved by the multigrid 
method. A reduction of three orders of magnitude in the 
residual for a three-dimensional transonic inviscid flow using 
920 k grid points is obtained in less than 4 min on a Cray C90. 
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Abstract 

The LU-SGS (lower upper symmetric Gauss Seidel) 
algorithm has been implemented into the Compressible 
Navier- Stokes, Finite Volume (CNSFV) code and validated 
with a multizonal Navier-Stokes simulation of a transonic 
turbulent flow around an Onera M6 transport wing. The 
convergence rate and robustness of the code have been im- 
proved and the computational cost has been reduced by 
at least a factor of 2 over the diagonal Beam- Warming 
scheme. 


Introduction 


The numerical simulation of the Navier-Stokes equa- 
tions about complex and realistic aerodynamic configu- 
rations with a structured grid requires the use of zonal 
methods. A popular and fairly efficient numerical scheme 
used to solve the Navier-Stokes equations is the diagonal 
implicit Beam- Warming algorithm [1,2]. A finite volume 
multi-zonal Navier-Stokes code, CNSFV, was developed 
using this scheme. However the Beam- Warming scheme 
uses approximate factorization to simplify the matrix in- 
version process. As a result the convergence properties of 
the scheme are dependent on the time step and time step 
scaling chosen, and much user input is required to deter- 
mine the optimum time step and time step scaling. For a 
numerical simulation of a realistic aerodynamic configura- 
tion. several dozen zones may be required. Choosing the 
optimum time step and scaling for each of these zones be- 
comes a tedious and time consuming process. For reasons 
not understood yet. the finite volume formulation of the 
diagonal Beam-VVarming scheme suffers a severe time step 
restriction. Typically for constant time steps, the largest 
CFL number that could be exercised were less than ten 
and much less than that for the initial transients. With a 
judicious time step scaling, the maximum CFL can be as 
high as 50 to /5. The time step restriction is not so severe 
with the finite difference formulation of the diagonal Beam- 
Warming scheme, where the maximum CFL numbers can 
be as high as 500. The time step restriction for the finite 
volume form of the diagonal Beam- Wanning scheme is se- 
vere enough that the time step is too small to be practical 
for an unsteady Navier-Stokes code. 
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A numerical scheme without the above mentioned 
time step restrictions is the LU-SGS (lower upper symmet- 
ric Gauss Seidel) algorithm [3,4]. This scheme has been im- 
plemented into the CNSFV code and validated with a mul- 
tizonal Navier-Stokes simulation of a transonic flow around 
an Onera M6 transport wing. 


Navier-Stokes Equations 

The three-dimensional thin-layer Navier-Stokes equa- 
tions in strong conservation law form in curvilinear coor- 
dinates are 

d r VQ + diF + dvG + d<;H 
= Rt~ 1 (d i F v + d^G v + d c H v ) (1) 

where 
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The contravariant velocity components axe defined as 


U = V({ t 4- £ r u 4- £ y i/ 4 £,u>), 
y = V(*lt + r/rU 4 r) v v 4- rj x w), (3) 

W ~ V(G 4* C xU + 4 Gw) 

and the viscous fluxes axe given by 
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H v = ^V 
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m 0 = (u 2 4 v 2 4 tu 2 )/2 4 (Pr(7 - 1)) } (a 2 ) } 

m 2 = ({*«* 4 Zyvc 4 G^e)/ 3 ' 

m 3 = + *7y + Ts, (5) 

m 4 = (r7 z u n 4 r) y v v 4 rjgWjf)/ 3. 

^5 = Cx 2 + C y 2 4C 2 , 

m 6 = (Cx^c + (y v c + Cr^c)/ 3 

The pressure is given by the equation of state 


P = (7 - 1)0 ~ ?0 2 4 i> 2 4 w 7 )/ 2) (6) 

where 7 is the ratio of specific heats. The sound speed 
is denoted by a. The nondimensional parameters are the 
Reynolds number Re and the Prandtl number Pr. The 
coefficient of viscosity p and thermal conductivity k are 
decomposed into laminar and turbulent contributions as 
follows: 


/i = /if 4 fit 


k _ V _ f Ml , (h_\ 

Pr Pr \Pr, Pr.) 

where Pr\ and Pr t are the laminar and turbulent Prandtl 
numbers and k = pi with the nondimensionalization used in 
this paper. The standard Baldwin- Lomax turbulent eddy 
viscosity model [5] is chosen for this study. 

The nondimensional parameters chosen for this code 
are the same as those in the CNS code [6]. Denoting di- 
mensional quantities with a (), the normalizing parameters 
are the freestream density poo, the freestream sound speed 
fioo, the freestream viscosity coefficient and a charac- 
teristic length /. 

The metrics used above have a different meaning for a 
finite volume formulation compared to the finite difference 
formulation of [6]. Referring to a typical finite volume cell 
as shown in figure 1, the finite volume metrics are defined 
(see. for example, Vinokur [7]) as 


s ;+ i “ s :,;+^ s y4ij4 f k 
= oI( rT " r< ) x ( r * ~ r <) + ( r 3 - r <) * (f7 - r„)] 


s *+i = + 5 y ,i + lj + 

= ^[( r 7 - r 2 ) x (r 3 - r 2 ) + (r« - r 2 ) x (r : - r 2 )] 


s /+ i = s t<l+ p + s v<l+i j + s 2j +±k (7) 

= ^[( r 7 - r 5 ) X (r 6 - r s ) + (r e - r 5 ) x (r T - r 5 )] 

The finite volume metrics represent the cell face area 
normals in each of the curvilinear coordinates (£,*7,0* 
They are related to the metrics introduced in equations 
(1 - 5) as follows 

tziV = S ziJ+\ 


(S) 


Cr = 5 Zi(/+i 

where x, = x,y, z for t = 1,2,3. The volume of the com- 
putational cell is given by 


v - g[( r 4 “ r 2) x (r 8 - ri) • (r T - r 8 ) 


4(r 2 - r B ) x (r® - ri) • (r 7 - r e ) 

4(r e - r 4 ) x (r 8 - r x ) • (r 7 - r 8 )] (9) 

and is the finite volume equivalent of the inverse Jacobian 
of the coordinate transformation in the finite difference 
formulation of [6]. 

Numerical Method 

The governing equations are integrated in time for 
both steady and time accurate calculations. The unfac- 
tored linear implicit scheme is obtained by linearizing the 
fiux vectors about the previous time and dropping second 

and higher order terms. The resulting scheme in finite 

volume form is given by 


\I+V- 1 h(6 ( A n +6 n B n +6 <: C n -Re- 1 (6 ( L n +6 v M n +6 < N n ))) 


■AQ n = -V~ 1 hR n (10) 
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where the residual R n is 


P = T-‘T ( 


R n = [6'F n +6 v G n + 6 <: H n -Rc- 1 (S i F?~6„G*+6 < HZ)} 

(11) 

The convective flux jacobians A, B, C and the viscous flux 
jacobians L, M and N are defined in the appendix of [1], 

Solving the above equation set by a direct matrix in- 
version is still not practical for three dimensional problems. 
However there are several indirect or approximate methods 
available, including the diagonal Beam- Wanning scheme 
and the lower- upper (LU) factorized scheme of Yoon and 
Jameson. In a previous study [8,9], the diagonal Beam- 
Warming scheme was the basic flow solver algorithm for 
the multi-zonal compressible Navier- Stokes finite volume 
code, CNSFV. Wliile the scheme worked well for finite dif- 
ference codes, the performance deteriorated for the finite 
volume code. The cause of the deterioration is not known, 
but it is not isolated to the CNSFV code; several other 
finite volume codes using ADI schemes seem to suffer from 
the same type of deterioration. Because of this problem, a 
LU scheme which does not seem to suffer the same problem 
as the ADI schemes is incorporated into the CNSFV code. 
A brief description of both the diagonal Beam- Warming 
and the LU scheme is presented. 


Diagonal Beam- Warming Algorithm 

With the use of approximate factorization and diag- 
onalization of the flux Jacobian matrices, a scalar pentadi- 
agonai algorithm [2] can be derived from eqn (10) as 


Tc[I - F” 1 W«A { jiV[7 - V'" 1 h6„A n )P[I -p V 1 h6 ( A<] 


■T < ~ l ±Q n = R n (12) 

where 6 f is a central difference operator and &Q n = 

Q n ^ i - Q n with Q nJrl = Q(t n + h). The viscous terms 
are not included in the implicit side. The artificial dissi- 
pation is included in both sides and is derived below. 

The inviscid flux jacobians are diagonalized as fol- 
lows: 

d Q F = A = TjtAjT" 1 
d Q G = B = Tr,.\ n T- 1 
d Q H = C = 

The Tj . T,. T- are the eigenvector matrices of A. B, C. 
respectively with A*. A,, A; as the respective eigenvalues 

We also have 


.v = 17% 


Each of the factors of the implicit operator of equa- 
tion (12) has an artificial dissipation term added to sta- 
bilize the central difference operator. The added term is 
based on Jameson s nonlinear second and fourth order dis- 
sipation and for the ^-operator takes the form 


hV~ l Ve [* ;+t (e (2) A, • - e (4 > A { Vf A e -)]A<2 n (13) 


with 
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= mai(0,/c 4 — (16) 

where *2, are constants of o(l), and A Ve are *ke f° T ‘ 
ward and backward difference operators. <7j+ ^ is a modi- 
fied spectral radius defined as 


= a j + *>+ 1 


d; = <7 } (l -f ymax(ajt/o'j,^//^j)), (IT) 


<r ; = [\U\ + aJal+**+s*l (18) 

and where cell centered surface areas are used, e.g. 


The modified form of the spectral radius, equation 
(IT), is suggested by Turkel [10] to account for large aspect 
ratio computational cells as for example in a viscous layer. 
Similar dissipation terms are obtained for the 77- and (- 
operators. The dissipation terms added to the right hand 
side of equation (10) are identical to those given above 
except that AQ n is replaced by Q n . 


LU-SGS Scheme 

The unfactored scheme of eqn (10) is given in terms 
of central differences for the implicit operator. It can also 
be represented in terms of upwinded differences. Dropping 



(23) 


the viscous terms in the implicit operator, equation (10) is 
in terms of upwind differences as follows: 


; I - V - 1 kST A- -67 A- +s; B" -6; B~ + 8+C~)\ 

•A<?" = -V~ 1 kR n (20) 

where 5f and 57 axe the forward and backward difference 

opewiots--Sirnilarly, A+ and A~ are the Jacobian matrices 
which contain non- negative and non-positive eigenvalues, 
respectively. 

The Yoon and Jameson version of the LU scheme can 
be obtained from the above equation by a simple reordering 
of the matrix elements and approximately factoring into 

two matrices. Define D, Z, and U to be matrices which 
contain the diagonal, sub-diagonal and super- diagonal el- 
ements of the implicit operator of equation (20), respec- 
tively. 

[D — L + U}AQ n = -V~ l hR n 

which can be factored into 


[D-rL\D- l [D + U}<\Q n = -V~ l hR n 

Redefine L = D 4- L and U = D + U to yield the final form 
of the LU-SGS scheme. 


LD~ 1 U AQ n = - V- l hR n (21) 

where 

L = I + V~ l h(67A+ + 6-B+ 4- 6~C+ - A" - B~ - CT) 

D = r+V~ l h(A+ - .4“ + B* - B~ 4-C + -C") (22) 

U = 7 4- V- l h{6jA~ + 6* B~ 4- 4- A+ + B+ + C + ) 

A variety of LU-SGS schemes can be obtained by differ- 
ent choices of the numerical dissipation models and Jaco- 
bian matrices of the flux vectors. In this study we use the 
same artificial dissipation described for the diagonal Beam- 
Warming scheme. For robustness and to ensure that the 
scheme converges to a steady state, the matrices should 
be diagonally dominant. To ensure diagonal dominance, 
the Jacobian matrices can be constructed so that 4- ma- 
trices have nonnegative eigenvalues and — matrices have 
nonpositive eigenvalues. For example, the diagonalization 
used for the Beam-W'arming scheme can be used to obtain 
the ± matrices. 


a ± = r^vfrr 1 


b ± = T.Atr; 1 

C ± = r c Af 7T 1 

Another method to obtain diagonal dominance is to con- 
struct approximate Jacobian matrices: 

A ± = [A±p(A)I}/2 

B* = {B±p(B)I}/2 (24) 

C ± = [C±p(C)I\/2 

where p(A) = max[|A(A)|] and represent a spectral 
radius of the Jacobian matrix A with the eigenvalues 
A (A). These eigenvalues are, e.g., A(A) = [U, U, U, U ± 

4- sj 4- where the metric terms are again de- 
fined at cell centers by Eq. (19). Other methods of increas- 
ing the diagonal dominance of the LU operator include the 
addition some approximation of the viscous terms and the 
artificial dissipation in the implicit operator [11]. 

The inversion of equation (21) is done in three steps. 
The block inversion along the diagonal is eliminated if the 
approximate Jacobian of Eqn (24) are used instead of Eqn 
(23). A Newton- type of iteration is obtained simply by set- 
ting h = oo. Eliminating the time step from the algorithm 
provides the practical advantage of side-stepping the need 
to find an optimal CFL number or the time step scaling to 
reduce the overall computational effort to achieve steady 
state solutions. As mentioned previously, this is an im- 
portant consideration for a muitizonal code. If first-order 
one-sided differences are used, Eq. (22) reduces to 




D = pl (25) 

U = pi ~ AJ +l k l + B~ k+l l + C~ k l+1 

where 

p = p{A) + p(B) + p{C) 

This algorithm requires only scalar diagonal inversions 
since the diagonal of L or U = D. The true Jacobians 
matrices of Eq. (23) may permit better convergence rates, 
but require block diagonal inversions with approximately 
twice the computational effort per iteration. This study 
considers the scalar version without the viscous or artifi- 
cial dissipation only. 

The scheme is completely vectorizable on j 4* k 4- / = 
constant oblique planes of sweep, Fig. 2. This can be 
achieved by reordering the three dimensional arrays into 
two-dimensional arrays as follows: 
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P(i r ,p) = P(j,k,l) 

where i p is the serial number of the oblique plane of sweep 
and p is the address of point (j, k , /) in that plane. The 
specific value of p in terms of (j, k , /) is 


and the number of points in the plane p is 


n P = " 1 ) 


+ 1)(1 + Wt0n(l,p “ jmax ” 1)) 

-r(p — kmax)(p ~~ *max 4“ 1 )( 1 4 t«STyn(l, p — kmax “ 1)) 
4(P - lmax){p ~ Imax + 1)(1 4- t«Styn(l,p - l mai - 1)) 

(p ^mai ^mai )(p 1'ma: Lai 1) 

’(1 4“ i$*0n(l,p — kmax Lai 1)) 

~(P j max ~ ^max )(p “ jmax Lai 4" 1) 

*(1 4* zsi<jrn(l T p — j ma x “ Lai — 1)) 

— (p jmax ~~ k ma z)(p “ jmax Lai 4* 1) 

*( 1 T I5t<77l( 1, p Jmai — Lai 1))] 

The maximum number of points in the planes is n p>max = 
rc p (p^), where p^ = (p mflZ 4- l)/2. The vector lengths 

change from 1 at the beginning of the inversion sweep to a 
maximum of n PfTn<lx at the center of the sweep and back to 
1 at the end. While n Pffnax can be large, the vector lengths 
at the beginning and end of the sweep sure very small and 
inhibit efficient vectorization. Nevertheless for reasonably- 
sized zones , n ?tmaz is sufficiently large so that good vec- 
torization can be achieved. Other reordering sequences are 
possible, but additional sweeps will then be necessary and 
the overall performance of the LU-SGS inversion is less. In 
addition, the oblique planes of sweep allow for very efficient 
autotasking procedures on multi- pro cess or computers; the 
efficiency is less with other sweep directions. 


Boundary Conditions and Zonal Interfaces 

To complete the equation set boundary conditions 
must be specified. With the use of curvilinear coordi- 
nates the physical boundaries have been mapped into com- 
putational boundaries, which simplifies the application of 
boundary conditions. The boundary conditions to be im- 
plemented for external viscous or inviscid flows include (1) 
inflow or fax field. (2) outflow. (3) inviscid and (4) viscous 


impermeable wall, and (5) symmetry conditions. For ex- 
ternal three-dimensional flow fields about closed bodies, 
the topology of the grid usually introduces (6) grid singu- 
larities which require special boundary conditions. The use 
of zonal methods can avoid the generation of grid singu- 
larities, but requires (7) special zonal interface boundary 
conditions. For compressible flows these zonal boundary 
conditions must be conservative to maintain global con- 
servation. In this study we consider nonconservative in- 
terfaces. However, conservative interzonal communication 
has been developed and reported in a previous study [8]. 

A patched zonal procedure is used for the grid sys- 
tem. The zonal interfaces are general three-dimensional 
surfaces. The zoning procedure has a general capability 
so that a face of one zone can be in contact with several 
other zones. Information is transfered between zones with 
a bilinear interpolation procedure. The search method for 
determining the interpolation coefficients is automatic in 
that no user input is required other than identifying the 
zonal faces in contact with each other. The grid topology 
of the various zones is arbitrary and the zonal interfaces 
are not, in general, mesh continuous. In this study we use 
the ghost cell concept to obtain boundary and interface 
fluxes. Although there is no grid overlap at the zonal in- 
terfaces, the cell-centered conserved variables are reflected 
across surface and zonal boundaries and are determined by 
interpolating from the the adjacent zonal values. To deter- 
mine the dissipative fluxes for the fourth-order dissipation 
requires two ghost cells. However, here we simply reduce 
the fourth-order dissipation to second-order so that only 
one ghost cell is needed. 

Results 

The test case chosen to validate the LU-SGS scheme 
and the code is the attached turbulent transonic flow over 
an ONERA M6 wing. This case has good experimen- 
tal data [12] available and has been used extensively for 
Navier- Stokes code validation. The reason for choosing 
an attached flow case instead of a separated flow is that 
the emphasis of this study is to validate the flow solver and 
not the turbulence model. For attached flows the Baldwin- 
Lomax model [5] is sufficiently accurate. The flow condi- 
tons are Moo = 0.84, a = 3.06°, Re mac = 11.72 x 10 6 . The 
wing surfaces are adiabatic and the fluid is a perfect gas 
with 7 = 1.4. The computational grid consists of a C-0 
mesh split into four zones. There are two zones around the 
wing, one each on the upper and lower surface of the wing. 
Similarly, the wake region is split into an upper and lower 
grids. This grid, in the single zone version, seems to be the 
standard grid used for various validations of Navier-Stokes 
codes, see, e.g., ref. 13. Figure 3 shows a partial view of 
the grids and the zones. The total grid size is 193 x 34 x 49 
points, and the individual zones consist of 73 x 34 x 49 and 
25 x 34 x 49 points for the wing and wake zones, respec- 
tively. The wall normal grid spacing at the surface is on 
the order of y + = 4. This is sufficient to properly resolve 
the boundary layer. 

The converged results in terms of pressure contours 
on the upper surface of the wing are compared with ex- 
perimental data [12} axe shown in Fig. 4. The agreement 
between computation and experiment is good for the four 
span stations shown. These results are comparable to the 
results reported in ref. 13. The numerical predicted shocks 
are slightly smeared as compared to experiment. This is to 
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be expected since the numerical scheme uses centred differ- 
encing with nonlinear scalar dissipation. Increased mesh 
resolution or an upwind scheme will sharpen the shock 
wave structure. 

Figure 5 presents the performance comparison be- 
tween the diagonal Beam- Warming scheme and the LU- 
SGS scheme. These views show the convergence rate for 
the upper wing zone with the two schemes. The first view 
shows that the LU-SGS scheme is not that much faster 
than the diagonal Beam- Warming scheme in terms of iter- 
ations required to reach convergence. However, in terms of 
cpu times, the LU-SGS scheme is at least twice as efficient, 
as shown in the second view. 

Most of the gains in efficiency with the LU-SGS 
scheme is in the reduced operation count. Table 1 shows 
the unit cpu time (fistc/ gridpoint/ iteration) for both 
schemes, as measured on the NAS A- Ames Cray C90 com- 
puter. The unit cpu time for the right hand side (RHS) in- 
cludes the calculation required for the RHS fluxes, bound- 
ary conditions, interface interpolation, swapping data 
in/out of machine core for each zone at each time step, 
writeout of convergence data, and other miscellaneous op- 
erations. This unit time is the same for both schemes at 
4.5^sec. The left hand side (implicit) unit cpu time is sub- 
stantially different. The times are 5.9 nsec for the diagonal 
Beam- Wanning and 2.2 fistc for the LU-SGS scheme. The 
total times are 6.7/isec and 10.4/j.sec for the LU-SGS and 
diagonal Beam- Warming schemes, respectively. Further- 
more, the diagonal Beam- Wanning scheme required user 
intervention to obtain the optimum time step size as well 
as the time step scaling. The LU-SGS scheme uses no time 
step scaling and an infinitely large time step as explained 
in LU-SGS section above. 

Conclusions 

A multi-zonal compressible Navier- Stokes code has 
been improved by replacing the implicit diagonal Beam- 
Warming algorithm with the lower-upper symmetric Gauss 
Seidel scheme. With the new scheme the code is now much 
more robust, requires no user intervention to determine the 
optimum time step or time step scaling, converges faster, 
and requires only half the cpu time to obtain the same level 
of convergence. Most of the gain in efficiency is due to the 
lower operation count required by the LU-SGS algorithm. 
The algorithm and code have been validated with a tur- 
bulent simulation of an attached transonic flow around an 
Onera M6 wing. 
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Schema 

CNSFV - dlag BW 

CNSFV - tUSGS 

Tout 

10.4m sec (340 mflop) 

6.7 peec (330 mflop)* 

LHS 

5.9ii3ec 

2^psec 


xilnv 

lusgs 


etainv 

(oblique 


zetainv 

planes of 


v pen la, etc 

sweeps) 

RHS, 

4.Spsec 

4.5mS*c 

etc. 




♦ average vector length ^70, optimum length ^128 


Table 1. Perftrace performance comparison of diag- 
onal Beam- Warming and LU-SGS schemes on Cray C90 
computer, (unit cpu time ^sec/grid point /iteration) 
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Fig. 1. Finite volume cell nomenclature. 


Fig. 2. Oblique plane of sweep for vectorization. 



Fig. 3. Four zone C-0 Grid for the ONERA M6 
wing; only the two zones on the upper surface of wing and 
wake are shown. 
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Fig. 4. Comparison of predicted and experimental 
[12] C p distribution on the ON'ERA M6 wing at four span 
locations. Turbulent transonic flow at M ^ = 0.84, a = 
3.06, and Re ma c = 11.72 x 10 6 . Baldwin-Lomax turbulent 
algebraic eddy viscosity model (5j used for the numerical 
prediction. 
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Fig. 5. Comparison of the convergence rates ver- 
sus iteration and cpu times between the diagonal Beam- 
Warming and LU-SGS schemes. 
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